Submitted to the Statistical Science 



Measurement error in GLMMs 
with I NLA 

Stefanie MufP, Andrea Riebler*, Havard Rue, Philippe 
Saner, and Leonhard Held 



O 

(N 



Abstract. Measurement error (ME) theory has largely been neglected in 
^ many applied sciences, although central variables are often difficult to 

measure and may thus contain considerable error. To account for ME 
in explanatory variables, Bayesian approaches provide a flexible frame- 
work. However, given the analytic intractability of the posterior distri- 
bution, model inference so far has to be performed via time-consuming 
P-h and complex Markov chain Monte Carlo implementations. In this paper 

^ we extend the Integrated nested Laplace approximations (INLA) ap- 

proach to formulate Gaussian ME models in generalized linear mixed 

, , models. We present three applications, and illustrate how parameter 

P_] estimates are obtained for common ME models, such as the classical 

and Berkson error model including heteroscedastic variances. We con- 
, ' elude that the solution presented here provides an attractive alternative 

to likelihood-based approaches and hope it will stimulate the greater 
, use of Bayesian methods for ME modelling. To illustrate the practical 

feasibility, R-code is provided. 

J> Key words and phrases: Bayesian analysis, Berkson error, Classical er- 

ror, Integrated nested Laplace approximation, Measurement error. 
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1. INTRODUCTION 

The existence and the effects of measurement error (ME) in statistical analy- 
ses have been recognized and discussed for more than a century, see for example 
Pearson (1902); Wald (1940); Berkson (1950); Fuller (1987); Carroll et al. (2006). 
.^h The sources of ME are manifold and imply much more than just instrumental 

imprecision in the measurement of physical variables, such as length, weight etc., 
but may include for instance biases due to preferential sampling, incomplete ob- 
servations or misclassification. 

Unless the ME is negligible, modeling it is crucial. If ME is ignored, parameter 
estimates and confidence intervals in statistical models often suffer from serious 
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biases. If a regression model is multivariate and some covariates can be measured 
with and some without error, even the effects of the error-free measured covari- 
ates can be biased, where the direction of the bias depends on the correlation 
among covariates (Carroll, Gallo and Gleser, 1985; Gleser, Carroll and Gallo, 
1987). Moreover, ignoring ME may cause a loss of power for detecting signals 
and connections among variables, and may mask important features of the data. 
Given these facts, it is surprising that ME is often completely ignored or not 
treated properly. One reason might be that the statistical standard literature of- 
ten pays very little attention to this aspect, although the problems have been 
recognized for a long time. 

For successful error-correction both the amount of error (i.e., the error variance) 
and the error model need to be specified correctly. Hence, information about the 
underlying measurement process is essential. Possible errors must be identified 
early in a study and the entire data-collection process should be driven by such 
considerations. In the last decades, several approaches to model and correct for 
ME have been proposed, such as method-of-moments corrections (Fuller, 1987), 
simulation extrapolation (SIMEX) (Cook and Stefanski, 1994), regression calibra- 
tion (Carroll and Stefanski, 1990; Gleser, 1990), or Bayesian analyses (Clayton, 
1992; Stephens and Dellaportas, 1992; Richardson and Gilks, 1993; Dellaportas 
and Stephens, 1995; Gustafson, 2004). A thorough overview of current state-of- 
the-art methods is given in the books of Carroll et al. (2006) and Buonaccorsi 
(2010). 

In this paper, we focus on Bayesian approaches, as they provide a flexible 
framework and are thus well suited to treat ME. One of the main advantages 
of is that prior knowledge, and in particular prior uncertainty, e.g., in variance 
estimates, can be incorporated in the model. While frequentist approaches require 
to fix one or several parameters to guarantee identifiability, the Bayesian setting 
allows to represent uncertainty with suitable prior distributions. Note that the 
fixation of parameters in a frequentist setting corresponds to a Bayesian point 
prior. 

Up to now, posterior marginal distributions of such errors-in-variables mod- 
els have been estimated by employing a Markov chain Monte Carlo (MCMC) 
sampler, see for example Stephens and Dellaportas (1992) or Richardson and 
Gilks (1993). However, case-specific implementation may be challenging, MCMC 
is time-consuming and its analysis and interpretation requires diagnostic tools. 
Generic software like WinBugs (Lunn et al., 2000), OpenBugs (Lunn et al., 2009), 
or MCMC samplers in R, such as MCMCpack (Martin, Quinn and Park, 2011) 
or LaplacesDemon (Statisticat and LLC, 2013), might be used, but they suffer 
from the same drawbacks as any MCMC techniques. 

Recently, an alternative to MCMC has been proposed to estimate posterior 
marginals by integrated nested Laplace approximations (INLA) for the class of 
latent Gaussian models (Rue, Martino and Chopin, 2009). INLA provides accu- 
rate approximations avoiding time-consuming sampling. Due to its flexibility in 
the choice of likelihood functions and latent models, INLA is an appealing al- 
ternative to likelihood-based inference in particular for generalized linear mixed 
models (GLMMs) (Fong, Rue and Wakefield, 2010). The INLA approach is im- 
plemented in C and easy to use under Linux, Windows and Macintosh via a freely 
available R-interface (R Core Team, 2012). The R-package r-inla can be down- 
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loaded from www.r-inla.org. It allows to easily specify models in a modular way, 
where different types of regression models can be combined with different types 
of error models. Moreover, it is straightforward to incorporate random effects, 
such as independent or conditional autoregressive (CAR) models to account for 
spatial structure, which is of importance in several settings (Bernardinelli et al., 
1997). Here, we used the r-inla version updated on Feb 10, 2013. 

In this paper we extend the INLA framework to the most common Gaussian 
ME models, namely the classical and the Berkson ME models, which are suitable 
for continuous error-prone covariates. To facilitate the usage of the INLA-package 
with the new features, R-code is provided in the Supplementary Material. We 
hope that the solution presented here will increase the use of ME thinking in 
practice and stimulates the greater use of Bayesian methods in ME modeling. 

Section 2 introduces three applications from the biological/medical field con- 
taining: a linear regression combined with heteroscedastic classical error, a logistic 
model with an exposure model suffering from classical error, and an overdispersed 
Poisson regression model with Berkson error. In Section 3 we will review the clas- 
sical and Berkson ME models and their effects. Bayesian analysis with INLA is 
introduced in Section 4, where we will describe how to use this framework for 
model inference in the presence of classical and Berkson ME. Section 5 presents 
modeling details and the results of the three applications analyzed with both 
INLA and MCMC. Finally, we provide a discussion and outlook in Section 6. 

2. EXAMPLES OF MEASUREMENT ERROR PROBLEMS 

2.1 Inbreeding in Swiss ibex populations 

We analyzed data described by Bozzuto et al. (2013) on 26 Alpine ibex pop- 
ulations in Switzerland, some of them monitored over the last 100 years. The 
study aimed to quantify the effect of inbreeding on populations' intrinsic growth 
rates y. The intrinsic growth rate is the theoretical maximal rate of increase of a 
population, if there are no density-dependent effects. The inbreeding coefficient 
Xi of population i (often denoted as /) is a quantity between and 1, with larger 
values indicating that the population is stronger inbred. This quantity has been 
estimated from genotype analyses at 37 neutral microsatellite loci. To this end, 
a Bayesian analysis was employed to derive the estimates Wi and error variances 
for each population i. If a linear regression model y = (3q + fix x + £ is fitted with 
w instead of x, the absolute value of the slope parameter \(3 X \ is underestimated 
X = -2.59, 95% CI: [-5.44,0.25]). Accounting for ME in w, the effect of in- 
breeding on population growth dynamics is more pronounced (f3 x = —3.03, 95% 
CI: [-6.18,-0.08]). 

2.2 Influence of systolic blood pressure on coronary heart disease 

The Framingham heart study is a large cohort study carried out between 1973 
and 1986, and aimed to understand the factors leading to coronary heart dis- 
ease (Kannel, Neaton and Wentworth, 1986). The data used here were originally 
presented in MacMahon et al. (1990). The main predictor in this logistic regres- 
sion model is systolic blood pressure (SBP). As in Carroll et al. (2006, Section 
9.10), we analyzed data of re = 641 males considering smoking status z and 
x = log(SBP — 50) as predictors. The transformation of SBP was originally pro- 
posed in Cornfield (1962), and used by Carroll et al. (1984), Carroll et al. (1996), 
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and in Carroll et al. (2006). Since it is impossible to measure the long-term SBP, 
measurements at single clinical visits had to be used as a proxy. Note that, due 
to daily variations or deviations in the measurement instrument, the single- visit 
measures might considerably differ from the long-term blood pressure (Carroll 
et al., 2006). Hence, the ME in SBP has been a concern for many years in this 
study. Importantly, the magnitude of the error could be estimated, as SBP had 
been measured twice at different examinations (wi, W2). A naive approach ig- 
noring ME would fit a logistic regression against the indicator of coronary heart 
disease y G {0, l} n 

logit [P(y = 1 1 x, z)] = (3 + p x x + @ z z , 

where the true covariate x is replaced by the average of the two measurements 
w = (w\ + W2)/2 . The slope f3 x is attenuated in this naive regression (/3 X = 1.66, 
95% CI: [0.69,2.63]) compared to the estimate obtained with error modeling 
X = 1.93, 95% CI: [0.76,3.05]). 

2.3 Seedling growth across different light conditions 

In a shadehouse experiment, described in Paine et al. (2012), the impact of 
shading (dark, middle, light) and defoliation (0, 25%, 50%, 75% reduction of 
leaf surface) on plant growth in the Malaysian rainforest was investigated. The 
number of new leaves per plant after a four months growth phase was counted 
and used as the response variable for plant growth. Here, we analyzed 60 seedlings 
from the species Shorea fallax, from which 20 individuals were grown each under 
dark, middle, and light shading conditions. There were five shadehouses for each 
of the three shading conditions, and each shadehouse contained four individuals. 
Each seedling in a shadehouse was exposed to a different degree of defoliation 
treatment, compare Figure 1. In experimental studies in ecology, it is common 
practice that the value for the target light intensity is assigned to all replicates 
within a treatment class (i.e., dark, middle, light). However, it is obvious that due 
to external conditions the actual observed light availability might considerably 
vary from the target value within replicates. 

In this application we can quantify the introduced error, as the actual ex- 
perimentally created light availability was measured for each replicate. Let the 
covariate x = log(%light) denote the correct light intensities, and w the re- 
spective target values. The selected regression model is Poisson and includes an 
unstructured random term to account for potential overdispersion. In contrast to 
the above examples 2.1 and 2.2, where the bias induced by including w instead of 
x in the regression attenuates the parameter estimates, the respective regression 
coefficient is overestimated here. We will illustrate in Section 5.3 how this bias 
becomes larger when the error variance is (artificially) increased. 

3. MODELS FOR MEASUREMENT ERROR IN REGRESSION 

3.1 The generalized linear model 

Assume we have n observations in a generalized linear model (GLM). The 
data are given as (y,z,x), with y = (yi, ■ ■ ■ ,y n ) T denoting the response, z = 
(z\, . . . , z n ) T a covariate matrix of dimension n x p for p error-free covariates, 
and x = (xi, . . . , x n ) T a single error-prone covariate whose true values are unob- 
servable. The generalization to multiple error-prone covariates is straightforward. 
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Fig 1 . Illustration of the shadehouse experiment. There were five shadehouses per light condition 
and each shadehouse contained four seedlings. All seedlings in a shadehouse were exposed to a 
different defoliation treatment, 0% indicating that the leaves were not cut, 25% that one fourth 
of each leaf was cut, etc. 

Suppose y is of exponential family form with mean \n = E(yi \ x, z, (3), linked to 
the linear predictor rji via 

(la) ^ = h(rji) 

(lb) r/i = P + PxXi + (3] Zi . 

Here, h{-) is a known monotonic inverse link (or response) function, j3o denotes the 
intercept, (3 X the fixed effect for the error- prone covariate x, and Zj is p x 1 with 
a corresponding vector f3 z of fixed effects. This GLM is extended to a generalized 
linear mixed model (GLMM) by adding normally distributed random effects on 
the linear predictor scale (lb). 

Let w = (wi, . . . , w n ) T denote the observed version of the true, but unobserv- 
able covariate x. We distinguish between two different ME processes: the classical 
error model and the Berkson error model (Berkson, 1950). The graphical struc- 
ture of these models is very similar, compare Figure 2, but the effects caused by 
these models are fundamentally different. 

3.2 Classical measurement error model 

In the classical error model it is assumed that the covariate x can be observed 
only via a proxy w, such that, in vector notation, 

W = X + u , 

with u = (ui, . . . , u n ) T . Classical ME is unbiased and throughout the paper the 
components of the error vector u are assumed to be independent and normally 
distributed with mean zero and variance t" 1 , i.e. Cov(uj,Uj) = for % ^ j. Note 
that in the following we parameterize the normal distribution with mean and 
precision (or precision matrix in the multivariate context), rather than using the 
variance or covariance matrix. Thus we write u ~ A/"(0, t u T). 

We assume that the error term u is independent of the true covariate x, but 
also independent of any other covariates z and the response y. This implies a 
non-differential ME, meaning that y and w are conditionally independent given 
z and x. In most applications this assumption is plausible as it implies that, 
given the true covariate x and covariates z, no additional information about the 
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Fig 2. Graphical structure of the error models. Rectangular boxes illustrate variables that are 
observed, while circles indicate unknown variables. Black arrows correspond to the classical, and 
the dashed grey arrows to the Berkson error model. The fundamental difference is the change in 
direction of the arrow between x and w. 



response variable y is gained through w (Carroll et al., 2006). Ideally, repeated 
measurements of the true value available, so that 

(2) Wij | Xi ~ M(xi,T u ) , 

where Wij is the j measurement of Xj. More generally, the error structure can 
be heteroscedastic with w.j ~ J\f(x,T u ~D), where w.j denotes the vector of the 
j th measurements, and the entries in the diagonal matrix D represent weights di 
that are proportional to the individual error precision r Ui := T u (xi) depending 
on Xi. In fact, both the homo- and heteroscedastic cases are relevant in practice 
(see, e.g., Subar et al. (2001) or example 5.1 presented here). 

Parameter estimates belonging to x are usually attenuated in the classical 
ME setting. Consider for instance a simple linear regression with homoscedastic 
ME. Fitting the naive model y = (3$ + (3*w + e* instead of the true model 
V = fio + fi x x + e will result in < \f3 x \, if the error variance 1/t u is larger than 
zero. We therefore call (3* the naive estimate. The left panel of Figure 3 illustrates 
this attenuation affect. Another important effect is the significant increase of the 
variability around the regression line. 

3.3 Berkson ME model 

Berkson-type error can be observed in experimental settings, where the value 
of a covariate may correspond to, e.g. a predefined fixed dose, temperature or 
time interval, but the true values x may deviate from these planned values w due 
to imprecision in the realization. The second setting where Berkson-type error 
occurs is in epidemiological or biological studies, where, e.g., averages of exposures 
in areas are assigned to individuals living or working close-by. Examples are the 
application of fixed doses of herbicides in bioassay experiments (Rudemo, Ruppert 
and Streibig, 1989) or the radiation epidemiology study described in Kerber et al. 
(1993) and Simon et al. (1995). Such circumstances led to the Berkson error model 
(Berkson, 1950) 

X = w + u , 
where u and w are independent, and 



(3) 



x | w ~ Af(w, r M D) , 
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Covariate (x or w) Covariate (x or w) 

Fig 3. Effect of ME in the linear model. Left: Classical ME. Two effects can be seen: 1) The 
absolute value of the covariate estimate is biased (attenuated); 2) The variability around the 
regression line in the data with ME (black circles) is much larger than in the case of the truly 
observed data (grey squares). Right: Berkson ME. The absolute value of the covariate estimate 
is unbiased in the linear model, while the variability around the regression line is larger for the 
data with ME. 



with D denoting a diagonal matrix as in Section 3.2. Like classical ME, the 
Berkson error is also assumed to be non-differential. The effect of Berkson error 
is fundamentally different from that of classical error. In the linear regression 
model there is no attenuation effect, as illustrated in the right panel of Figure 3. 
However, the residual precision suffers from the same qualitative bias as in the 
classical ME model. Issues become more involved for GL(M)Ms. For instance, 
parameter estimates for logistic regression are only approximately consistent in 
the Berkson case (Burr, 1988; Bateson and Wright, 2010), which makes error 
modeling essential. 

The fundamental difference between the classical and Berkson error models 
implies different relationships between the error variances. Denote with r^T 1 and 
r^ 1 the variances of x and w, respectively. Due to the independence assumption 
of x and u in the classical and between w and u in the Berkson error case, the 
variances in the two ME models can be written as 

= r" 1 + t" 1 (classical) , 
t' 1 = t' 1 +t~ x (Berkson) . 

Thus, the surrogate w is more variable than the true covariate x in the classical 
model, whereas the opposite is true in the Berkson case. This effect can also be 
observed in Figure 3. 

4. MEASUREMENT ERROR MODELING WITH INLA 

Bayesian analyses of ME problems date back to the seminal work of Clayton 
(1992) and are usually based on hierarchical models. Bayesian hierarchical models 
often require the use of MCMC algorithms, which typically are time-consuming 
and require diagnostic checks to ensure good mixing properties and convergence 



8 



S. MUFF FT AL. 



of the simulated samples. These might be some of the reasons why Bayesian ME 
analysis is not widely used in practice. 

4.1 Inference for latent Gaussian models based on deterministic 
approximations 

Rue, Martino and Chopin (2009) proposed with INLA a new approach based 
on accurate approximations to perform fast Bayesian inference in a sub-class of 
hierarchical models, namely latent Gaussian models. The observation variable yi 
is assumed to belong to a distribution family where the mean m is linked to the 
linear predictor r\i via a link function so that r/j = The likelihood 

can be controlled by hyperparameters 0\, e.g., in the Gaussian case the residual 
variance. To account for the effect of different covariates, the linear predictor is 
additively composed: 

nj 

(4) m = Po + u af U) +Pjzi + £i- 

As in Equation (lb), (3 Z represents the vector of fixed effects of covariates z. 
The {/^'}s are unknown functions of the covariates u, where Uij denote known 
weights defined for each observed data point. The SiS are additional unstructured 
terms. 

A latent Gaussian model is obtained by assigning to v = (rj, (3q, f,P z ) a Gaus- 
sian prior with precision matrix Q(#2) 5 controlled by hyperparameters #2- Of 
note, v is reparameterized to include 77 instead of e. The reason is that INLA 
(originally) requires that each data point t/i is linked to a single component of the 
latent field, namely rji (Martino and Rue, 2010). Hence, a small random noise, e^, 
with high precision is always automatically added to the model. Finally, we need 
to specify a prior distribution on 6 = (6j , #J). 

Assuming that m, i = 1, . . . , n, are conditionally independent given v and 0, 
the posterior distribution is 

p(v, 0\y)(x p(0) p(v I 6) ]J p(yi I Vi, 9) , 

i 

where p(-|-) denotes the conditional density of its arguments. The posterior marginal 
distributions of interest are 

(5) v(vi\y) = I p(v i \0,y) J >(0\y)dG , 

Je 

(6) V (6 j I y) = f p(0| y) dB-i . 

Except for cases when everything can be computed analytically, exact inference 
is very challenging. Hence, sampling-based approaches have been the standard 
tool (Gelfand and Smith, 1990). Currently, few generic software packages based 
on MCMC, e.g. OpenBugs (Lunn et al., 2009), are available. 

INLA avoids sampling and instead provides accurate approximations to (5) and 
(6). First, it builds a Laplace approximation to p(6 \ y), from which a representa- 
tive set of support points {Ok} is selected. For each 6^ a Laplace approximation 
(or its simplified version) is built to p(vi \ 9}~,y). Finally, to obtain approximations 
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of p(vi | y), the integral in Equation (5) is numerically computed as a finite sum. 
Numerical approximations of p(6j \ y) are obtained similarly. 

As discussed in Rue, Martino and Chopin (2009) and illustrated in a variety 
of different applications, the approximation error of INLA is small compared to 
the Monte Carlo error and negligible in practice, see for example Paul et al. 
(2010); Schrodle et al. (2011); Riebler, Held and Rue (2012). For details on the 
approximations we refer to Rue, Martino and Chopin (2009). 

4.2 Classical ME model inference 

Consider a generalized linear (mixed) model regressing a response y on covari- 
ates x and z. The covariate z can be observed directly, while instead of x only 
a surrogate w\x, 6 ~ M(x,t u T)), following the classical error model (2), is avail- 
able. This model is not covered by the general INLA model set (4). The problem 
is that it includes a product structure of two unknown components, namely f3 x 
and x. Here, j3 x is no longer a fixed effect, but plays the role of an unknown 
scaling factor (hyperparameter) for the latent process x. 

To fit within the scope of INLA, x must follow a Gaussian model. As the 
intercept /3q, covariates z and the unstructured terms E{ do not influence the 
calculations, let us consider a simplified model, where only the product term of 
interest remains: 

(7) V = PxX , 

(8) w = x + u , 

x = ao + [i , 

with u ~ M(0,t u B) and fi ~ A/"(0, t x I). Further, let a ~ AA(0, 10~ 4 ), (3 X ~ 
AA(0, 10~ 4 ), t x ~ G(a x ,b x ), and t u ~ G(a u ,b u ). Other priors for the precisions 
t x and t u could also be used in INLA, see Roos and Held (2011) for an example. 
Set 6 = (P x ,T x ,T u ,ao). The latent field in this simplified model contains only x. 
Hence, the posterior distribution of x and 6 is 

p(x,6\y,w) cc p(0) p(x\0) p(w\x,6) p(y\x,0) . 

Using p(x | 6) p(w | x, 9) = p(x \ w, 6) p(w \ 6), we get 

(9) p(x,0\y, w) oc p(0) p(x\w,0) p(w \ 9) p(y\x,9) , 
where 

w | 6 ~ M (a l, [KD)- 1 + (t x I)- l1 



Since x only enters in one term in (9) (apart from the likelihood), it can be 
treated as an ordinary latent model, where x\w,0 is Gaussian: 

p(x\w,6) oc p(x | 9) p(w | x, 9) 

oc exp ^— ^ (x — aol) T (x — aol) — ~^( x ~ w) T D(x — w) 

Combining the quadratic forms gives 

x | w, 9 ~ N ( (T x a l + TuDw){t x I + r u D) _1 , r x \ + r M D 
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To arrive at a more convenient model formulation, we apply a change of vari- 
ables and set 

P x x -)• v , 

so that r\ = v. Following the same steps as above but replacing x with v, we can 
derive a latent model for v\w, 6: 

v\w,6 ~ N (p x (T x a l + T u T>w)(T x I + T u T>)-\ Trl+ " 



PI 

This model is termed "mec" within the R-package r-inla and has four hyperpa- 
rameters: /3 X , t x , t u , «o- 

The mec model could be extended to a more complex setting, where the co- 
variate x is regarded as a random variable depending on other covariates, leading 
to a structural model with prior distribution 

(10) x | z ~ N(a + za z ,T x T) . 

Here, ao is the intercept, a z is the p x 1 vector of fixed effects, and r^ 1 the 
residual variance in the linear regression of x on z. Equation (10) is the expo- 
sure model and it is often used in epidemiological studies (Gustafson, 2004). The 
assumption that the distribution of the unobserved x given the observable covari- 
ates z follows a normal distribution is crucial to apply INLA, but often justified. 
Due to recent extensions of INLA, see Martins and Rue (2012), which allow that 
independent components of the latent field can have a non-Gaussian distribution, 
this assumption might be relaxed in the future. 

For exposure models including many components, the model specification with 
mec becomes complex. Recently, a new copy feature was added to the INLA 
package to treat situations where elements of a latent field x are needed more than 
once for each observation, consider for example bivariate normally distributed 
random effects (Martins et al., 2012). An almost identical copy denoted by x* for 
x is then created and the latent field is extended to x c = (x,x*) with tt(x c ) = 
p(x) p(x* | x) and 

p(x* | x, if>, t) oc exp (^~7}( x * ~ ipx) T (x* — iftx) 

The precision r, fixed to some large value, controls the similarity between x* and 
x (default value: 10 9 ). A copied model may contain an unknown scale parameter 
ip, which per default is fixed to one. Our framework can be considered as such 
a copy setting, where x appears in different levels of the model (compare the 
simplified model (7), (8), (10)), instead of repeated times for one observation. 
The parameter f3 x from (7) can thereby be regarded as the unknown scaling 
factor -0 of the copied model. ME models with general exposure structures as in 

(10) are hence straightforward to analyze with INLA, compare application 5.2 
and the corresponding Supplementary Material. 

The general model structure including (lb), (2) and (10) is hierarchical and 
can be written as 

E(y) = h(p + PxX + z(3 z ) , 

(11) = -x + a + za z + e x , e x ~ JV(0, r x T) , 
w = x + u, u ~ A/"(0, t u D) . 
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Note that the exposure model (10), encoded in (11), can be easily extended 
to include structured or unstructured random effects terms. Implementation in 
INLA requires a joint model formulation, where the column of the actual response 
y is merged with a column of pseudo-observations 0, compare (11), to account for 
the exposure model (see also Schrodle, Held and Rue (2012) and Ruiz- Cardenas, 
Krainski and Rue (2012)), and a column of the observed values w to model the 
error. The resulting INLA response matrix contains one separate column per 
equation, namely 

~yi NA NA " 

y n NA NA 
NA NA 

NA NA 
NA NA Wi, 

NA NA w n . 

Here, Wi. denotes the mean from the repeated measurements Wij of Xi, which 
is commonly used as proxy for the latter. Each observation column requires the 
assignment of a different likelihood function. The latter two are always assumed 
to be Gaussian (or near-Gaussian), while the first follows the specified exponential 
family distribution. In most software packages the assignment of various likelihood 
functions is not straightforward, if possible at all. In INLA the only requirement 
is that the response observations, given the latent field and the hyperparameters, 
are conditionally independent. 

4.3 Berkson ME model inference 

Consider again a generalized linear (mixed) model regression (lb), but replace 
the classical error model (2) by the Berkson model (3) 

x | w, 6 ~ Af(w, t u D) . 

Since x is defined conditionally on the observations w, the exposure model (10) 
becomes obsolete. The same simplifications as in (7) lead to the hierarchical model 

V = P x x , 

(12) x = w + u , 

with u ~ A/"(0, t u D), f3 x ~ A/"(0, 10~ 4 ) and t u ~ G(a u , b u ). The hyperparameters 
are = (f3 x , r u ). Importantly, the latent model x \ w, is now identical to the error 
model (3), because the latent field contains only x. It is thus straightforward to 
calculate the posterior distribution 

p(x,6\y,w) oc p(0) p(x\w,0) p(y\x,0) . 

The same change of variables j3 x x — > v as in the classical case is useful again, 
and leads to 

v\w,0 ~ ff(p x w, ^|d) . 
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This model is termed "meb" within the R-package r-inla and has two hyperpa- 
rameters: j3 x and r u . 

As in the classical case, the copy feature might be used alternatively, see Sec- 
tion 4.2. However, here it does not add to the generality of the model specifica- 
tion as no exposure model is involved. Thus, we recommend the use of the more 
straightforward model definition using the meb model, and just illustrate for com- 
pleteness the formulation using the copy feature. Since the respective joint model 
contains only the two components 



E(y) = 

— w = 

the response matrix simplifies to 



h(p + p x x + z(3 z 
— X + u . 



VI NA 



Un 

NA 



NA 



NA 

— W\ 

-w n 



8j) T are the 



The latent field is given by i; = (x , f3 z ) , and = (f3 x ,r u ,u 1 
hyperparameters, where 6\ may again contain additional hyperparameters of the 
likelihood. All components in v and the coefficient f3 x obtain Gaussian priors, 
and the error precision t u a suitable gamma prior. 

5. APPLICATIONS 

In the following, we demonstrate how to define the different measurement error 
applications introduced in Section 2 in the INLA framework. The respective r- 
inla code is given in the Supplementary Material. A comparison of the results 
obtained by INLA to those obtained by an independent MCMC implementation 
is provided for each application to highlight the accuracy of INLA. 

Note that the efficiency of sampling-based techniques, such as MCMC, is gen- 
erally improved when the covariates are centered around zero (Gelfand, Sahu and 
Carlin, 1995, 1996). The same is true for INLA. Here, covariates are sometimes 
not centered to compare to the results of a reference example. Additional adjust- 
ments of the default parameters in the numerical optimization routine of INLA 
might be needed then, see Supplementary Material. 

5.1 Inbreeding in Swiss ibex populations 

The ibex data introduced in Section 2.1 were analyzed using a linear model 
with classical heteroscedastic error variances. The observation model is a simple 
Gaussian response model 

y\x ~ Af(Po + p x x,T e T) 

with y being the intrinsic growth rates and x the inbreeding coefficients of the 
populations. The level of inbreeding Xi in population i = 1,...,26 was esti- 
mated as Wi from a Bayesian analysis, which, as a by-product, also provided an 
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Fig 4. Uncertainty in the covariate x in the ibex study, depending on the estimate w. Larger 
values can be estimated with less precision (i.e., larger variance l/r u ). 

estimated population-specific error precision f Ui . Larger values of w have more 
uncertainty, i.e., smaller precision, as shown in Figure 4. It is natural to formulate 
a heteroscedastic classical error model 

w | x ~ N(x, T U D) 

with entries t u . in the diagonal matrix D. Since x is the only covariate in the 
model, the exposure model (10) reduces to 

x ~ J\f( ao l,T x I) . 

Here, Qo = was fixed due to the preceding covariate centering. The unknowns 
in this example are the latent field v = (x T , /?o) T and the hyperparameters 
•,' T ui T xi T e) • We assigned independent A/^O, 10 ) priors to Pq and j3 x . 
All precisions were given gamma priors with mode fixed at some estimated value. 
This led to t u ~ G(2, 1) with mode equal to one, t £ ~ G(10, 0.387) with mode 
at half of the residual precision of the regression y ~ w, and r x ~ G(10, 0.0117) 
with mode at f x = [l/f w — l/f u ] , where f w is the sampling precision of w, and 
t u is the mean of all estimates f Ui . The use of informative priors is essential here, 
because the model is not identifiable if the error precision t u is not restricted. 

An MCMC simulation was run for lOO'OOO iterations with a burn-in of lO'OOO 
iterations and a saving frequency of 10. The estimates obtained from INLA were 
chosen as starting values. Convergence was visually checked. Figure 5 shows the 
perfect fit between the MCMC samples and the posterior marginals of INLA. 
Of note, due to the Gaussian likelihood the results obtained by INLA are exact 
and contain no approximation error. The parameter estimates are graphically 
compared to the naive analysis, including w instead of x, in Figure 6. The absolute 
value of the slope \[3 X \ and the residual precision t £ are underestimated in the 
naive regression, as predicted by the theory. For further comparison, the results 
of an analysis with SIMEX are given, suggesting a slightly stronger correction 
for the slope {3 X . SIMEX was run as described in Cook and Stefanski (1994) and 
Devanarayan and Stefanski (2002), using a quadratic extrapolation function and 
a bootstrap of lO'OOO iterations to estimate the quantile confidence intervals. 
There also exists an R-package simex (Lederer and Kuchenhoff, 2009), but it did 
not allow for heteroscedastic error variances at the time when this analysis was 
carried out. 
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Fig 5. Comparison of the MCMC samples (histograms) with the INLA posterior marginals 
(lines) for the ibex data. 



As was mentioned in Section 2.1 and is obvious from Figure 6, the difference 
between the naive and the corrected estimate of f3 x is not extreme (about 17%). 
Yet, in this example the difference is just enough to remove from the confidence 
interval. 

5.2 Influence of systolic blood pressure on coronary heart disease 

Since SBP has been measured twice at different examinations, the magnitude of 
the measurement error can be quantified. Here, we assume that the average values 
w between w\ = log(SBPi — 50) at first examination and W2 = log(SBP2 — 50) 
at second examination are independent and normally distributed with mean x 
and precision r u , leading to the classical homoscedastic error model 



W | X ~ J\f(X, Tyl) 
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Fig 6. Parameter estimates and 95% confidence/ credible intervals in the ibex data analysis. The 
dashed lines indicate the naive estimates. Only parameters where a naive estimate was available 
are included here. 
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The observation model is logistic, incorporating smoking status z and long- 
term blood pressure x as covariates: 

T) = logit [P(y = 1 1 x, z)) = fi + (3 x x + /3 z z . 

Finally, the exposure model (10) comes in its most general form 

x | z ~ J\f(aa + a z z, t x T) . 

The unknowns in this model are v = (x T , fio, fi z , ao, a z ) T and 6 = (fi x ,r u , T x ) . 
The a- and /3-parameters were given independent jV(0, 10~ 4 ) priors. The prior 
for t u was selected by using the repeated measurements w\ and w 2 as follows: 
starting from an improper reference gamma prior G(ao,6o) = G(0,0) for r u , the 
parameters of the final gamma prior G(a u , b u ) are deduced as 

p(t u \w 1 ,w 2 ) oc p(w 1 ,w 2 \t u ) ■ p(r u ) 

oc C" 1+n • exp ^ ^[(wn - w t ) 2 + (w l2 - w l .) 2 ] - r u ■ 6 ) 

(13) ~ G^^hi-ffijf + ^-ffii) 2 ]), 

i 

where uJj, = (wn + Wi 2 )/2 is the entry of the vector w. Note that (13) is the 
prior precision for w (not for the repeated measurements w± and w 2 ). Following 
Example 5.1, t x was given a gamma prior G(100, 4.02) with mode at the estimated 

r I -1 

value f x = 1/t w \ z — l/f u , where f w \ z is the residual precision of w given z. 
Again, the use of informative priors is essential to avoid identifiability problems. 
The prior specifications might deviate from the reference example in Carroll et al. 
(2006), where the exact parameters were not given, but it is important to use the 
knowledge from repeated measurements and observed values in order to obtain 
sensible results. Furthermore, note that Carroll et al. (2006) used the quantity 
A := t x /t u instead of t u , and gave it a uniform prior in the interval (0, 0.5). Since 
this is not straightforward with INLA, the model was modified as described. 

To obtain Markov Chain Monte Carlo (MCMC) posterior marginals, regres- 
sion coefficients of GLMs cannot directly be sampled from a standard full condi- 
tional distribution. Here, samples were obtained according to Gamerman (1997). 
The algorithm can be used if the observations yi are conditionally indepen- 
dent and follow an exponential family density. For the regression coefficients 
(3 = (/3o, fi x , /3j) T , this approach uses transition densities that combine the 
weighted least squares method with a prior on j3 (McCullagh and Nelder, 1989; 
West, 1985). The full conditionals for the unknowns in our logistic regression 
model are given in the Supplementary Material. 

The simulation was run for 200'000 iterations with a burn-in of lO'OOO and ev- 
ery 10 th value was saved. Starting values for oc and (3 were chosen from the INLA 
output. For t u and t x , the initial estimates f u = nj Ya=i [( w i,i ~ Wi.) 2 + (wi j2 — wi 
and f x from above were used. The acceptance rates for (3 and x were 83% and 
69%, respectively. 

The agreement between the MCMC and INLA output is almost perfect. The 
corresponding Figure containing the MCMC histograms and the INLA posterior 
marginals is included in the Supplementary Material. Figure 7 shows parameter 
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Fig 7. Parameter estimates and 95% confidence/credible intervals for the Framingham data 
analysis. For MCMC and INLA, posterior means are used as point estimates. C.MCMC and 
C.ML stand for the Bayesian and the maximum likelihood analysis conducted in Carroll et al. 
(2006). The dashed lines indicate the naive estimates. 



estimates for the naive regression model including w, and for four error-correction 
approaches. Note that only parameters where the naive estimate was available 
were included. Carroll et al. (2006) analyzed the data with a maximum-likelihood 
method and a Bayesian approach using MCMC. The fourth and fifth rows show 
the results as obtained from MCMC and INLA with the setup described here. 
All error-corrected estimates and the confidence/credible intervals are similar. 

5.3 Seedling growth across different light conditions 

Let y denote the number of new leaves per plant after a four months growth 
phase. The covariate z denotes the degree of defoliation and x = log(%light) the 
light intensity, where w is the respective target value. Using w instead of x in 
the analysis leads to the homoscedastic Berkson error model 

X | W ~ N(W, T U I) . 

In the following we centered both covariates w and z. This data structure leads 
to a Poisson regression model with nested design. To account for overdispersion, 
independent normal random effects 7^ ~ AA(0, r 7 ) have been added, extending 
the GLM to a GLMM: 

(14) rjijk = log(E[y ijk \ x, z, (3, 7]) = /3 + (3 x Xij + f3 z z k + j ijk , 

with i = 1,2,3 denoting the light condition, j = 1, . . . , 5 the shadehouse per 
light condition, and k = 1, . . . , 4 the degrees of defoliation. The unknowns of this 
model are v = (x T , (3q, (3 z ) t and = ((3 x ,t u , t 7 ) t . 

Priors were chosen as (3 ~ AA(0, 10" 4 ), t u ~ G(100, 19.88) and r 7 ~ G(100, 1.712), 
where the mode for the t u prior was fixed at the estimated value f u = 14/ J2ij(xij — 
Wij) 2 = 4.98 from the individual shadehouse measurements x when compared to 
w, and the mode for the r 7 prior at the f 7 = 57.83 estimate from the regres- 
sion (14) including w instead of x. 

The results from the regression with INLA were compared to an MCMC run 
with 200'000 iterations and a burn-in of lO'OOO iterations. Sampling was based 
on a reparameterization as proposed by Besag et al. (1995), where all except one 
full conditional distribution are standard and can be Gibbs-sampled. Figure 8, 
left, shows the posterior marginal for the coefficient of interest, f3 x , as obtained 
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Fig 8. Posterior marginals for MCMC (histograms) and INLA (lines) for the seedling growth 
example. The straight solid lines indicate the correct estimates of the slope (3%, while dashed lines 
mark the naive estimates when w is included in the regression. (Left) Original data. (Right) 
Data with five times larger error variance. 

by MCMC (histograms) and INLA (line). The solid and dashed lines mark the 
estimates when x or w are included in the regression, respectively. As can be 
seen, the bias due to the Berkson error is relatively small (J3 X = 0.307 vs. 0.345). 
To illustrate the size of a potential error correction, we now assume that the error 
variance 1/t u was five times larger, i.e., the true values x were scattered broader 
around the target values w. To this end, error was added artificially to x. The 
analysis was, apart from that, repeated with the same data and the same model 
(including, of course, an adjusted prior for r u ). The right panel of Figure 8 illus- 
trates the much stronger bias induced by the increased Berkson error (f3 x = 0.241 
vs. 0.345), and shows that the model approximately recovers the correct value. 
Marginals for the other regression parameters are not shown, as no noticeable 
bias was present. 

Theory predicts that slope parameters in log-linear Poisson models do not suffer 
from a bias in the presence of Berkson ME (Carroll, 1989). The model used here 
nevertheless shows this bias (and it can be artificially enhanced), which might be 
due to small deviations from the ideal Poisson model assumptions and overdis- 
persion. 

6. DISCUSSION 

Measurement error in covariates may lead to serious biases in parameter es- 
timates and confidence intervals of statistical models, if it is not accounted for 
in the analysis. A variety of approaches to model such error have been proposed 
in the past decades, among which Bayesian methods probably provide the most 
flexible framework. Bayesian treatments, employing MCMC samplers, have been 
successfully applied for more than 20 years, but their application has never be- 
come part of standard regression analyses. 

The aim of this work was to illustrate how the most common ME models (clas- 
sical and Berkson error) can be included in GLMMs using the recently proposed 
INLA framework, which gives fast and accurate approximations instead of doing 
any sampling. The provided R-code should help to make such models accessible 



18 



S. MUFF FT AL. 



to a broader audience. Note that INLA includes a much larger variety of likeli- 
hood functions and latent models than we could illustrate here, and the modular 
structure adds to its flexibility. It is, for instance, straightforward to treat sev- 
eral mismeasured covariates jointly, to introduce a systematic bias into the error 
model, or to include any structured random term into the model formulation. 

One of the biggest challenges when treating mismeasured variables is the es- 
timation of the error variance, either from repeated measurements, instrumental 
variables or from previous studies. The advantage of a Bayesian approach, as the 
one taken here, is that uncertainty of such estimates can be incorporated into 
prior distributions, while frequentist approaches require to fix such parameters. 
Moreover, sensitivity to chosen prior assumptions can be easily checked due to 
the computational speed of INLA, see Roos and Held (2011). 

Gaussian classical and Berkson error models naturally fit into the INLA frame- 
work of latent Gaussian models, and thus the error-prone covariates used here are 
always continuous. The computational speed of INLA may allow to treat even 
misclassification error by following the misclassification SIMEX (MC-SIMEX) 
idea presented in Kiichenhoff, Mwalili and Lesaffre (2006). Here, INLA would be 
applied repeatedly to estimate posterior marginals for different degrees of misclas- 
sification and extrapolating back to no misclassification. Such ideas need further 
investigation and should be detailed in the future. 

SUPPLEMENTARY MATERIAL FOR "MEASUREMENT ERROR IN 

GLMMS WITH INLA" 

Due to space constraints, the R-code for all examples presented here is de- 
scribed in detail in the supplementary document. Furthermore, this document 
contains full conditionals and posterior marginals for Section 5.2. On www . r- inla . 
org/examples/case-studies/muf f-etal-2013 selected data and R-code are 
provided for download. 
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SUPPLEMENTARY MATERIAL 

7. R-CODE FOR THE THREE APPLICATIONS IN THE MAIN TEXT 

In this section we guide the reader through the r-inla code and technical 
details of the three examples discussed in the main text. On www.r-inla.org/ 
examples/case-studies/muff -etal-2013 selected data and R-code are pro- 
vided for download. The r-inla package can be installed by typing the following 
command line in the R terminal: 

source ( "http : //www. math . ntnu . no/inla/givemelNLA . R ") 
upgrade . inla (testing=TRUE) 

Using 

inla. version () 

information regarding the actual installed version is shown. Here, we used the r- 
inla version built on Feb 10 2013. For more information regarding the installation 
process we refer to www.r-inla.org. 

7.1 Inbreeding in Swiss ibex populations 

The object data consists of three columns: 

y w error. prec 

They contain (for n = 26): 

• y\ . . .y n : The populations' intrinsic growth rates. 

• w\ . . . w n : The estimated inbreeding coefficients. 

• error. precj . . . error. prec n : The error precisions in the estimates w. 

Start with the prior specification process, as described in the main text: 
attach (data) 

varYEst <- (summary (lm(y~w))$sigma~2) /2 
varXEst <- var(w) - mean (1/ error .prec) 
prec.y <- 1/varYEst 
prec.x <- 1/varXEst 
prec . u <- 1 

prior .prec .y <- c(10, 9/prec .y) # prior mode at prec.y 
prior .prec.x <- c(10, 9/prec .x) # prior mode at prec.x 
prior .prec .u <- c(2, 1) 
prior. beta <- c(0, 0.0001) 

Next, we define the INLA model formula, where the new mec model is employed. 
Note that the heteroscedasticity in the error is encoded by assigning the vector 
of error precisions error. prec to the scale option. The model contains four 
hyperparameters: 

• beta corresponds to (3 X , the slope coefficient of the error-prone covariate x, 
with a Gaussian prior. 

• prec.u is the error precision t u with gamma prior. 

• prec.x is the precision r x of x ~ N(ceol, t x I) with gamma prior. 
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• mean.x corresponds to the mean a®, which is fixed here at due to covariate 
centering. 

The prior settings are defined in the different entries of the list hyper. The option 
fixed specifies whether the corresponding quantity should be estimated or fixed 
at the initial value. The field par am captures the prior parameters of the corre- 
sponding prior distribution. Gaussian prior distributions are the default for beta 
and mean.x, while log-gamma distributions are used for the log-transformed pre- 
cisions prec.u and prec.x. Note hereby, that if a variable r is gamma distributed 
with shape parameter a and rate parameter b leading to the mean a/b and vari- 
ance a/b 2 , then log(r) is log-gamma distributed with the same parameters a and 
b. 

library (INLA) 

formula <- y f(w, model = "mec" , scale = error. prec, hyper = list( 
beta = list( 

param = prior .beta, 

fixed = FALSE 

), 

prec.u = list( 

param = prior .prec .u, 
initial = logCprec .u) , 
fixed = FALSE 
), 

prec.x = list( 

param = prior .prec.x, 
initial = logCprec .x) , 
fixed = FALSE 
), 

mean.x = list( 
initial = 0, 
fixed = TRUE 
) 

) 

) 

The call of the inla function includes the specifications for t £ , the hyperpa- 
rameter of the Gaussian regression model. These can be controlled via the con- 
trol .family option. The prior distribution for the intercept /?o is specified in the 
control . fixed option. 

r <- inla (formula, data = data.frame(y, w, error .prec) , 
family = "gaussian" , 
control . family = list( 
hyper = list( 

prec = list (param = prior .prec .y , 
initial = log(prec .y) , 
fixed = FALSE 

) 

) 

), 
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control . fixed = list( 

mean . intercept = prior. beta [1] , 
prec . intercept = prior. beta [2] 

) 

) 

r <- inla.hyperpar(r, dz = 0.5, diff.logdens = 20) 

The last command improves the estimates of the posterior marginals for the hy- 
perparameters of the model. The call is optional, but a slightly better agreement 
with the MCMC posterior marginals was found in this example. To get a quick 
overview of the results, use the summary command. 

summary (r) 



7.2 Influence of systolic blood pressure on coronary heart disease 

The object data consists of four columns: 

y W\ Wi z 

They contain (for n = 641): 

• Vi ■ ■ ■ Un- The binary response yi G {0, 1}. 

• w\i . . . w\ n : log(SBP — 50) at examination 1. 

• u>2i . . • W2n' log(SBP — 50) at examination 2. 

• z\ . . . z n : Smoking status z\ G {0, 1}. 

As described in the main text, the hierarchical model of this example is formulated 
in INLA as a joint model by applying the copy feature. The full model can be 
written as 
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The reader is guided through the r-inla code for this joint model formulation 
in the following. The terms below the brackets indicate the names as they will be 
employed in the code. Start with the prior specification process, as described in 
the main text: 



attach (data) 

w <- rowMeans (cbind(wl , w2)) #the mean of wl and w2 
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n <- nrow(data) #641 
Ntrials <- rep(l, n) 

varUEst <- sum((wl - w)"2 + (w2 - w)"2)/n 
varXEst <- summary (lm(w~z))$sigma~2 - varUEst 
prec . u <- 1/varUEst 
prec.x <- 1/varXEst 

prior. prec. u <- c(n, l/2*sum((wl - w)~2 + (w2 - w)~2)) 
prior .prec.x <- c(100, 99/prec.x) # prior mode at prec.x 
prior. beta <- c(0, 0.0001) 

Second, the response matrix Y and the data vectors are filled according to the 
naming of the above joint model equation: 

Y <- matrix (NA, 3*n, 3) 
Y[l:n, 1] <- y 
Y[n+(l:n), 2] <- rep(0, n) 
Y[2*n+(l:n), 3] <- w 

beta.O <- c(rep(l, n) , rep(NA, n) , rep(NA, n)) 

beta.x <- c(l:n, rep(NA, n) , rep(NA, n)) 

idx.x <- c(rep(NA, n) , l:n, l:n) 

weight, x <- c(rep(l, n) , rep(-l, n) , rep(l, n)) 

beta.z <- c(z, rep(NA, n) , rep(NA, n)) 

alpha .0 <- c (rep (NA , n) , rep (1, n) , rep (NA , n) ) 

alpha, z <- c(rep(NA, n) , z, rep(NA, n)) 

data. joint <- data. frame (Y, beta.O, beta.x, idx.x, weight. x, 
beta.z, alpha. 0, alpha. z) 

The next step contains the definition of the INLA formula. There are four 
fixed effects (/3q, (3 z , ao an d ct z ) and two random effects. The latter are needed to 
encode for the fact that the values of x in the exposure (7) and error model (8) 
are assigned the same values as in the regression model (6), where /3 x x represents 
a product of two unknown quantities. The two random effects terms are: 

• f (beta . x , . . . ) : The copy=" idx . x" call guarantees the assignement of iden- 
tical values to x in all components of the joint model. As discussed in the 
main text, /3 X is treated as a hyperparameter, namely the scaling parame- 
ter of the copied process x*. Note that the initial value of 1.91 given in the 
following corresponds to the naive estimate of j3 x . 

• f (idx.x, . . . ) : idx.x contains the x values, encoded as an i.i.d. Gaussian 
random effect, and weighted with weight.x to ensure correct signs in the 
joint model. The values option contains the vector of all values assumed by 
the covariate for which the effect is estimated. It must be a numeric vector, 
a vector of factors or NULL. The precision prec of the random effect is 
fixed at r = exp(— 15). 

library (INLA) 

formula <- Y f (beta.x, copy = "idx.x", 

hyper = list (beta = list (initial = 1.91, 
param = prior .beta, fixed = FALSE))) + 
f(idx.x, weight.x, model = "iid", values = l:n, 
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hyper = list(prec = list (initial = -15, fixed = TRUE))) + 
beta.O - 1 + beta.z + alpha. + alpha. z 

Since there is no common intercept in the joint model, it has to be explicitly 
removed using -1. The call of the inla function is given next. The following 
options need some explanation: 

• family: There are three different likelihoods here, namely the binomial 
likelihood of the regression model and two Gaussian likelihoods, one for 
the exposure and one for the error model. They correspond to the different 
columns in the response matrix Y. 

• control . family: Specification of the hyper parameters for the three likeli- 
hoods, in the same order as given in family. The binomial likelihood does 
not contain any hyper parameters, thus the respective list is empty. In the 
second and third likelihood the hyperparameters t x and t u need to be spec- 
ified. 

• control . fixed: Prior specification for the fixed effects. 

• control . inla: Specification of internal parameters used in the optimiza- 
tion routine of INLA. This is optional and not necessary if covariates were 
centered. For details of the options consult ?control . inla. 

r <- inla (formula, Ntrials = Ntrials, data = data. joint, 
family = c ("binomial" , "gaussian" , "gaussian") , 
control . family = list( 

list (hyper = list()), 
list (hyper = list( 

prec = list (initial = log(prec . x) , 
param = prior .prec. x, 
fixed = FALSE))), 
list (hyper = list( 

prec = list(initial=log(prec.u) , 
param = prior .prec .u, 
fixed = FALSE)))), 
control . fixed = list( 

mean . intercept = prior .beta [1] , 
prec . intercept = prior .beta [2] , 
mean = prior. beta[l] , 
prec = prior .beta [2] ) , 
control. inla = list( 
h = le-5, 
tolerance = le-6, 
int . strategy = "grid" , 
dz = 0.2) 

) 

summary (r) 
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7.3 Seedling growth accross different light conditions 

Analysis with the meb model 

The object data consists of the three columns: 

y w z 

They contain (for n = 60): 

• Hi ■ ■ ■ Vn- The number of new leaves. 

• w± . . . w n : log(%light) for the target light intensities under dark, middle and 
light conditions (i.e., only three different values). Here, the centered values 
are used. 

• z\ . . . z n : Degree of defoliation (0%, 25%, 50%, 75%, respectively the cen- 
tered values). 

Let us start again with prior specification process in accordance to the main 
text: 

attach (data) 

n <- 60 # number of seedlings 

s <- 15 # number of shadehouses 

w <- w + rep(rnorm(s,0,le-4) ,each=n/s) 

individual <- l:n # id to incorporate individual random effects 

prec.u <- 4.98 # estimated value 

prec. tau <- 57.83 # estimated value 
prior. beta <- c(0, 0.0001) 

prior .prec .u <- c(100 ,99/prec .u) # prior mode at prec.u 
prior .tau <- c (1 00, 99/prec. tau) # prior mode at prec. tau 

The fourth line contains a trick to ensure that the light values w from the s = 15 
shadehouses are not completely identical, because in the new meb model only the 
unique values of w are used. Thus, if two or more elements of w are identical, then 
they refer to the same element in the covariate x. Next, we define the meb model 
formula. The model contains two hyperparameters: 

• beta corresponds to (3 X , the slope coefficient of the error-prone covariate x, 
with a Gaussian prior. 

• prec.u is the error precision t u with gamma prior. 

The prior settings are defined in the different entries of the list hyper. The 
option fixed specifies whether the corresponding quantity should be estimated 
or fixed at the initial value. The field param captures the prior parameters of 
the corresponding prior distribution. A Gaussian prior distribution is the default 
for beta, while a gamma distribution is used for prec.u (again defined as log- 
gamma distribution for the log-precision). 

The model contains as additional fixed effect the degree of defoliation z, plus 
an additional i.i.d. random effects term per individual to account for unspecified 
heterogeneity, specified in f (individual ,...) , which extends the GLM to a 
GLMM: 
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library (INLA) 

formula <- y f(w, model="meb" , hyper = list( 
beta = list( 

param = prior .beta, 
fixed = FALSE 

), 

prec.u = list( 

param = prior .prec .u, 
initial = log(prec.u) , 
fixed = FALSE 

) 

)) + 
z + 

f (individual , model = "iid", values = l:n, hyper = list(prec = list( 
initial = log (prec .tau) , 
param = prior .tau 

) 

) 

) 

The call of the inla function includes the specification of the family, which is 
Poisson here and thus includes no additional hyperparameters. The prior distri- 
butions for the intercept /?o and the slope j3 z are specified in the control . fixed 
option. 

r <- inla (formula, data = data. frame (y, w, z, individual) , 
family = "poisson" , 
control . fixed = list( 

mean . intercept = prior .beta[l] , 
prec . intercept = prior .beta [2] , 
mean = prior . beta [1] , 
prec = prior .beta [2] ) , 

) 

summary (r) 



Analysis with the copy feature 

As described in the main text, as an alternative to the use of the new meb model, 
the same results can be obtained by employing the copy feature in INLA. The 
approach is similar to the one taken in Section 7.2. 

The object data now contains an additional fourth column: 

y w z sh 

Column sh contains the values shi, . . ., sh n , where shj is the index of the 
shadehouse where seedling i was grown, i = 1, . . . , 15. As the error model in this 
example is Berkson, the joint model simplifies to two equations and the response 
matrix has only two columns. The model can be represented as 
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Note that there are n = 60 seedlings distributed over s = 15 shadehouses, 
whereas always five shadehouses belong to the same light condition (dark, mid- 
dle, light). There are thus 15 different correct light intensities (x, one value per 
shadehouse), but only 3 different target light intensities (w, one value per light 
condition). Terms below the brackets correspond to the names in the R-code. 
Let us start again with prior specification process in accordance to the main text: 

attach (data) 

w.red <- aggregate (w, by = list(sh) , FUN = mean)[,2] 
n <- 60 # number of seedlings 

s <- 15 # number of shadehouses 

prec.u <- 4.98 # estimated value 

prec. tau <- 57.83 # estimated value 
prior. beta <- c (0,0. 0001) 

prior .prec .u <- c(100 ,99/prec .u) # prior mode at prec.u 
prior .tau <- c (1 00, 99/prec. tau) # prior mode at prec. tau 

The aggregate command in the second line aggregates the vector w of length 
n = 60 into the 15 (one per shadehouse) unique light values. 
Next, the response matrix Y and the data vectors are filled according to the 
naming of Equation (15): 

Y <- matrix (NA, n+s, 2) 
Y[l:n, 1] <- y 

Y[n+(l:s), 2] < w.red 

beta.O <- c(rep(l, n) , rep(NA, s)) 

beta.x <- c(sh, rep(NA, s)) 

idx.x <- c(rep(NA, n) , l:s) 

weight. x <- c(rep(NA, n) , -rep(l, s)) 

beta.z <- c(z, rep(NA, s)) 

gamma <- c(l:n, rep(NA, s)) 

data. joint <- data. frame (Y, beta.O, beta.x, idx.x, weight .x, beta.z, gamma) 

The definition of the INLA formula is almost analogous to the one in Sec- 
tion 7.2. The main difference is the additional i.i.d. random effects term per 
individual jijk, specified in f (gamma, . . . ) , which extends the GLM to a GLMM: 

library (INLA) 

formula <- Y ~ beta.O - 1 + 
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f (beta.x, copy = "idx.x", 

hyper = list (beta = list(param = prior .beta, fixed = FALSE))) + 
f (idx.x, weight .x, model = "iid", values = l:s, 

hyper = list(prec = list (initial = -15, fixed = TRUE))) + 
beta.z + 

f (gamma, model = "iid", values = l:n, 

hyper = list(prec = list(initial = log(prec.tau) , param = prior . tau))) 

As in Section 7.2 we have to explicitly remove the common intercept using -1. 
The call of the INLA function is as well in analogy to Section 7.2, but there are 
only two likelihoods involved here: the Poisson likelihood for the regression model 
and the Gaussian likelihood for the error model. The former has no additional 
hyperparameters, while in the latter the error precision t u needs specification. 

r <- inla (formula, data = data. joint, 

family = c("poisson" , "gaussian") , 
control . family = list( 

list (hyper = list()) , 
list (hyper = list( 

prec = list(initial=log(prec.u) , 
param = prior .prec .u, 
fixed = FALSE)))), 
control . fixed = list( 

mean. intercept = prior .beta[l] , 
prec . intercept = prior .beta [2] , 
mean = prior .beta[l] , 
prec = prior .beta [2] ) 

) 

summary (r) 



MEASUREMENT ERROR IN GLMMS WITH INLA 



31 



8. SUPPLEMENTS TO SECTION 5.2 IN THE MAIN TEXT 

8.1 Full conditionals for the MCMC sampler 

Let all variables be defined as in Section 5.2 of the main text. The full condi- 
tionals for the unknowns in the regression model are given as follows: 
For (3 = (/So, Px, Pz) T we have 

(3 | rest oc ir(y \ x, z) ■ vr(/3) 

oc exp(x>^-X>g(l + ^-^V) , 



where ??i = A) + Px x i + Pz z i an d r a/ g = 10 4 . The a = (ao,a z ) T coefficients 
can be sampled from a Gaussian distribution. Let D bet the matrix with rows 
Dj : = (1 zj) and A := T a p/r x . Then 

a | rest oc ir(x \ rest) • n(a) 

oc exp ^— -^(x — Da) T (x — Da) — T ^-cx T o^j 

~ Af((D T D + AI)- 1 D t x,t x (D t D + AI)) , 

where the second argument in the last expression is again the precision. To sample 
from the distribution of the latent variable x, full conditionals for X{ are needed: 

Xi | rest oc 7r(yj | Xi, z{) ■ w(wi \ xi) ■ ir(xi \ Zi) 

oc exp (yifji - log(l + e Vi ) - y(wj - Xi) 2 - y(x; - a - a 2 ^) 2 ^) • 

Finally, the precisions can be sampled from gamma distributions 
t x | rest oc 7r(a: | z) • tt(t x ) 

~ G^ + ^A + ^-^cO^-^c*)) , 



and 



r u | rest oc 7r(io | a;) • 7r(r u ) 

~ G (a u + |,6 U + ^{w - x) T (w - x)j 
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8.2 MCMC and INLA posterior marginals 




Fig 9. Comparison of the MCMC samples (histograms) with the INLA posterior marginals 
(lines) for the Framingham data. 



